
User defined network #748

Open
entlein wants to merge 4 commits into kubescape:main from k8sstormcenter:userdefNN

Conversation


@entlein entlein commented Mar 13, 2026

Overview

Introduces the concept of a user-defined network neighborhood (NN). Users can provide an NN (signing to come in the next PR) and load it.

Additional Information

The tests also fix CRD fields that were out of sync with upstream. More tests are coming today, e.g. a fake MITM test.

How to Test

By running the new component tests; I didn't create unit tests.

Examples/Screenshots

It works on my fork (cough cough); this is an attempt at cherry-picking it. It might need to wait until I have time to rebase my entire fork after KubeCon. Here are my current tests: https://github.com/k8sstormcenter/node-agent/actions/runs/23043455944/job/66926790811


Summary by CodeRabbit

  • New Features

    • Added support for user-defined network neighborhoods with pod label configuration
    • Enabled DNS anomaly detection and unexpected egress traffic monitoring with alert triggering
  • Tests

    • Added comprehensive end-to-end test coverage for user-defined profiles and network neighborhoods
    • Included multiple deployment test resources for profile and network configuration scenarios


coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

This change introduces user-defined network neighborhood support alongside existing user-defined profiles. It adds metadata key parsing for network definitions, propagates network data through container tracking, implements early-return paths to prevent auto-learning when user-defined networks exist, and includes comprehensive test coverage with new Kubernetes manifests and debug utilities.
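The early-return behavior described in the walkthrough can be sketched as follows. This is a minimal illustration, not the actual node-agent API: the `containerInfo` type and `shouldAutoLearn` helper are illustrative stand-ins for the shared container data and cache logic mentioned above.

```go
package main

import "fmt"

// containerInfo is a simplified stand-in for the shared container data the
// walkthrough mentions; only the field relevant here is modeled.
type containerInfo struct {
	WorkloadID         string
	UserDefinedNetwork string // name of a user-provided NetworkNeighborhood, "" if none
}

// shouldAutoLearn sketches the early-return path: auto-learning is skipped
// for any workload whose container carries a user-defined network name.
func shouldAutoLearn(c containerInfo) bool {
	return c.UserDefinedNetwork == ""
}

func main() {
	plain := containerInfo{WorkloadID: "w1"}
	userDefined := containerInfo{WorkloadID: "w2", UserDefinedNetwork: "fusioncore-network"}
	fmt.Println(shouldAutoLearn(plain))       // true: proceed with standard learning
	fmt.Println(shouldAutoLearn(userDefined)) // false: load the named NN instead
}
```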

Changes

Cohort / File(s) Summary
Container Lifecycle & Caching
pkg/containerprofilemanager/v1/lifecycle.go, pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go, pkg/objectcache/shared_container_data.go
Core logic for user-defined network handling: early returns for host containers, metadata key parsing, propagation of UserDefinedNetwork through container info, workload-level detection, and prevention of auto-learning when user-defined networks are present.
Rule Configuration
tests/chart/templates/node-agent/default-rules.yaml
DNS Anomalies and Unexpected Egress Network Traffic rules now enabled with isTriggerAlert set to true.
Test Resources & Deployments
tests/resources/curl-*.yaml, tests/resources/nginx-*.yaml
New Kubernetes Deployment manifests for testing: curl and nginx variants with user-defined profiles, networks, and combinations; includes plain, user-network, user-profile, and wildcard configurations.
Test Profiles & NetworkNeighborhoods
tests/resources/known-application-profile*.yaml, tests/resources/known-network-neighborhood.yaml
ApplicationProfile and NetworkNeighborhood YAML resources defining access surfaces, syscalls, and egress endpoints for test scenarios, including wildcard path matching patterns.
Test Utilities & Scripts
tests/resources/debug-learn-nn.sh
Bash script to deploy curl workload, trigger learning via DNS/HTTP traffic, poll CR statuses, and output learned NetworkNeighborhood and ApplicationProfile configurations.
Component Tests
tests/component_test.go
Extensive end-to-end test scenarios: Test_27_ApplicationProfileOpens and Test_28_UserDefinedNetworkNeighborhood covering path matching (absolute/relative, exact/wildcard/ellipsis), profile deployment, traffic generation, and alert collection.

Sequence Diagram

sequenceDiagram
    actor Pod as Pod Metadata
    participant CM as ContainerManager
    participant CDC as SharedContainerData
    participant NNC as NetworkNeighborhoodCache
    participant Storage as Storage/DB
    
    Pod->>CDC: SetContainerInfo with kubescape.io/user-defined-network label
    CDC->>CDC: Parse UserDefinedNetworkMetadataKey from pod labels
    CDC->>CDC: Populate WatchedContainerData.UserDefinedNetwork
    
    CDC->>NNC: Propagate UserDefinedNetwork to ContainerInfo
    
    NNC->>NNC: Check if container has non-empty UserDefinedNetwork
    alt User-Defined Network Present
        NNC->>Storage: Load user-defined NetworkNeighborhood by name
        Storage-->>NNC: Return NetworkNeighborhood
        NNC->>NNC: Set workload state to Full/Completed
        NNC->>NNC: Skip auto-learning for this workload
    else No User-Defined Network
        NNC->>NNC: Proceed with standard learning flow
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Hopping through networks, both defined and learned,
User-provided paths, how tables have turned!
No more auto-guessing where traffic should flow,
Neighborhoods crafted by those in the know.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 55.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title 'User defined network' is vague and overly broad and does not clearly convey the specific change being made. Resolution: revise the title to be more specific and descriptive, such as 'Add support for user-defined network neighborhoods' or 'Enable user-defined NetworkNeighborhood profiles'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (2)
tests/resources/test-28-iterate.sh (1)

217-219: Add null check for jq to avoid parse errors.

If no alerts are returned, $ALERTS could be [] or contain unexpected data. The jq command on line 218 may fail silently with || true, masking issues.

Consider validating that ALERTS is a valid JSON array before processing:

Suggested improvement
 echo "  === All alerts in $NS ==="
-echo "$ALERTS" | jq -r '.[] | "    [\(.labels.rule_id)] \(.labels.rule_name) | comm=\(.labels.comm // "?")"' 2>/dev/null || true
+if [ "$(echo "$ALERTS" | jq 'type')" = '"array"' ] && [ "$ALERT_COUNT" -gt 0 ]; then
+  echo "$ALERTS" | jq -r '.[] | "    [\(.labels.rule_id)] \(.labels.rule_name) | comm=\(.labels.comm // "?")"'
+else
+  echo "    (no alerts)"
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/resources/test-28-iterate.sh` around lines 217 - 219, The jq pipeline
on the ALERTS variable can fail or mask errors; before running echo "$ALERTS" |
jq ..., validate ALERTS is a valid JSON array (e.g., use jq -e to test
type=="array" or parse-then-check) and only run the jq formatting when that
check passes; otherwise print a clear "no alerts" or skip formatting. Update the
logic around the ALERTS variable and the jq invocation (the echo "$ALERTS" | jq
-r '.[] | ...' block) to perform this validation and avoid silently swallowing
parse errors with || true.
tests/resources/known-network-neighborhood.yaml (1)

18-23: Inconsistent naming convention for user-managed NetworkNeighborhood.

The resource fusioncore-network has the annotation kubescape.io/managed-by: User but lacks the ug- prefix required by Kubescape's UserNetworkNeighborhoodPrefix convention. This means it won't be recognized as user-managed by the isUserManagedNN detection logic.

However, since this NN is loaded directly via the kubescape.io/user-defined-network pod label (which fetches by name without prefix checks), it works correctly for the Test_28 use case. If this resource is expected to be discovered or handled by user-managed NN detection elsewhere, consider renaming it to ug-fusioncore-network to follow the established convention.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/resources/known-network-neighborhood.yaml` around lines 18 - 23, The
metadata name "fusioncore-network" is missing the required "ug-" prefix used by
Kubescape's UserNetworkNeighborhoodPrefix and thus won't be recognized by
isUserManagedNN; rename the resource from "fusioncore-network" to
"ug-fusioncore-network" (or update the name wherever referenced by
kubescape.io/user-defined-network label) and ensure the kubescape.io/managed-by:
User annotation remains, so the isUserManagedNN detection and any lookups by the
kubescape.io/user-defined-network pod label will consistently identify this as a
user-managed NetworkNeighborhood.
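The prefix convention the comment refers to can be sketched as below. The constant and function names (`userNNPrefix`, `isUserManagedName`) are illustrative, not the actual Kubescape identifiers (`UserNetworkNeighborhoodPrefix`, `isUserManagedNN`).

```go
package main

import (
	"fmt"
	"strings"
)

// userNNPrefix mirrors the "ug-" naming convention described in the review;
// the constant name here is illustrative.
const userNNPrefix = "ug-"

// isUserManagedName sketches prefix-based detection: a NetworkNeighborhood is
// treated as user-managed only when its name carries the expected prefix,
// regardless of the kubescape.io/managed-by annotation.
func isUserManagedName(name string) bool {
	return strings.HasPrefix(name, userNNPrefix)
}

func main() {
	fmt.Println(isUserManagedName("ug-fusioncore-network")) // true
	fmt.Println(isUserManagedName("fusioncore-network"))    // false: missing prefix
}
```

This is why `fusioncore-network` works for Test_28 (loaded by name via the pod label) but would be invisible to any prefix-based discovery.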
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/main.go`:
- Line 272: Remove the duplicate declaration of the variable cloudMetadata (the
earlier declaration already exists in the same block), so keep the original
declaration and delete the second one found in cmd/main.go; ensure all uses
reference the existing cloudMetadata variable and do not re-declare it (look for
any remaining "var cloudMetadata" occurrences and eliminate the redundant one).

In `@pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go`:
- Around line 208-213: The code currently skips refreshing a named user-defined
network by returning early in the periodic sync when
workloadHasUserDefinedNetwork(workloadID) is true, making user-defined NNs
effectively write-once; change the logic so that addContainer still prefers
sharedData.UserDefinedNetwork on initial creation but the periodic refresh/Sync
routine (the function that calls GetNetworkNeighborhood and updates
workloadIDToProfileState) must always attempt to refresh the named network
object and retry transient failures rather than skipping updates entirely when
workloadHasUserDefinedNetwork(workloadID) is true; locate calls to
GetNetworkNeighborhood, the addContainer path, and the periodic sync code that
uses workloadHasUserDefinedNetwork and modify the sync path to fetch/update the
cached named NN (and retry on transient errors) while preserving the
initial-preference behavior in addContainer.

In `@tests/resources/known-application-profile-wildcards.yaml`:
- Around line 12-16: Add the same required metadata.annotations used to mark
user-defined profiles to this ApplicationProfile so it is detected as a
user-defined profile; update the metadata of the ApplicationProfile with name
"fusioncore-profile-wildcards" to include the same annotations present in
known-application-profile.yaml (i.e., add the user-defined profile annotation
keys/values under metadata.annotations so the resource is recognized by the
user-defined profile detection logic).

In `@tests/resources/known-application-profile.yaml`:
- Around line 13-17: The ApplicationProfile manifest for metadata.name
"fusioncore-profile" is missing the user-defined profile annotations; add an
annotations map under metadata with keys kubescape.io/managed-by: User,
kubescape.io/status: completed and kubescape.io/completion: complete so the
ApplicationProfile resource is recognized as a user-defined, completed profile
(update the metadata block in the ApplicationProfile declaration).

In `@tests/resources/nginx-user-defined-deployment.yaml`:
- Around line 1-22: The file name is misleading: the manifest creates a
Deployment named "curl-28" with container name "curl" and labels
kubescape.io/user-defined-network: curl-nn and
kubescape.io/user-defined-profile: curl-ap, so rename the file to match the
content (e.g., curl-user-defined-deployment.yaml) and update any references in
tests or manifests that import the old filename; alternatively, if nginx is
intended, replace the image and container name in the Deployment (metadata.name
"curl-28", spec.template.spec.containers[0].name "curl", and image) to an nginx
image to make filename accurate.

In `@tests/resources/run-test-28.sh`:
- Around line 73-81: The wait loop can exit without AP reaching "completed" and
later loose alert checks allow false positives; after the loop check AP_STATUS
and fail fast (exit non-zero) if it is not "completed", and replace loose alert
count checks with explicit assertions: validate the specific ApplicationProfile
identity/annotation (use the same kubectl jsonpath that sets AP_STATUS) and
assert the expected alert name/type (not just count > 0) by querying the alert
resource (e.g., the variable used for alert count) and matching the expected
alert fields; update both the AP wait block (AP_STATUS/NS/kubectl get
applicationprofiles) and the later alert-check section (lines around 97-113)
accordingly so the script aborts on timeout and only passes when the exact
profile and alert are present.

In `@tests/scripts/local-ci.sh`:
- Around line 178-187: The current --test-only branch (guarded by TEST_ONLY)
calls start_port_forwards and then runs run_tests in a loop but stops on the
first failing run_tests and/or only preserves the last test's exit status, so
failures earlier in the list are lost and print_logs may not be reached; change
the loop to always run every requested test in TESTS (or the single run when
TESTS is empty), collect/accumulate a failure flag or an aggregate exit code
(e.g., keep a variable like overall_status=0 and set overall_status=1 if any
run_tests returns non-zero), do not exit early on a failing run, ensure
print_logs is always invoked, and exit with overall_status at the end; apply the
same pattern to the analogous block around lines 203-210.

---

Nitpick comments:
In `@tests/resources/known-network-neighborhood.yaml`:
- Around line 18-23: The metadata name "fusioncore-network" is missing the
required "ug-" prefix used by Kubescape's UserNetworkNeighborhoodPrefix and thus
won't be recognized by isUserManagedNN; rename the resource from
"fusioncore-network" to "ug-fusioncore-network" (or update the name wherever
referenced by kubescape.io/user-defined-network label) and ensure the
kubescape.io/managed-by: User annotation remains, so the isUserManagedNN
detection and any lookups by the kubescape.io/user-defined-network pod label
will consistently identify this as a user-managed NetworkNeighborhood.

In `@tests/resources/test-28-iterate.sh`:
- Around line 217-219: The jq pipeline on the ALERTS variable can fail or mask
errors; before running echo "$ALERTS" | jq ..., validate ALERTS is a valid JSON
array (e.g., use jq -e to test type=="array" or parse-then-check) and only run
the jq formatting when that check passes; otherwise print a clear "no alerts" or
skip formatting. Update the logic around the ALERTS variable and the jq
invocation (the echo "$ALERTS" | jq -r '.[] | ...' block) to perform this
validation and avoid silently swallowing parse errors with || true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5580f2ff-1092-4eff-bf45-5966b04f60de

📥 Commits

Reviewing files that changed from the base of the PR and between f673d45 and 7f06cc9.

📒 Files selected for processing (20)
  • cmd/main.go
  • pkg/containerprofilemanager/v1/lifecycle.go
  • pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go
  • pkg/objectcache/shared_container_data.go
  • tests/chart/templates/node-agent/default-rules.yaml
  • tests/component_test.go
  • tests/resources/curl-plain-deployment.yaml
  • tests/resources/curl-user-network-deployment.yaml
  • tests/resources/curl-user-profile-wildcards-deployment.yaml
  • tests/resources/debug-learn-nn.sh
  • tests/resources/known-application-profile-wildcards.yaml
  • tests/resources/known-application-profile.yaml
  • tests/resources/known-network-neighborhood.yaml
  • tests/resources/nginx-both-user-defined-deployment.yaml
  • tests/resources/nginx-known-network-deployment.yaml
  • tests/resources/nginx-user-defined-deployment.yaml
  • tests/resources/nginx-user-network-deployment.yaml
  • tests/resources/run-test-28.sh
  • tests/resources/test-28-iterate.sh
  • tests/scripts/local-ci.sh

Comment on lines +208 to +213
// Never overwrite a user-defined network neighborhood with an
// auto-learned one. Check if any container for this workload
// has a user-defined-network label.
if nnc.workloadHasUserDefinedNetwork(workloadID) {
continue
}


⚠️ Potential issue | 🟠 Major

Don't make user-defined NNs write-once cache entries.

addContainer is the only place in this file that does GetNetworkNeighborhood(..., sharedData.UserDefinedNetwork). After that, Lines 208-213 and 768-778 only let the periodic sync skip the workload, not refresh the named object. That means a transient failure on Lines 441-452 is never retried here, and later edits to the user-defined NN also stay stale until the container restarts. It also runs after Lines 176-182 have already copied the auto-learned object's state into workloadIDToProfileState, so the reported state can diverge from the cached user-defined NN.

Also applies to: 436-470, 766-778

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go` around
lines 208 - 213, The code currently skips refreshing a named user-defined
network by returning early in the periodic sync when
workloadHasUserDefinedNetwork(workloadID) is true, making user-defined NNs
effectively write-once; change the logic so that addContainer still prefers
sharedData.UserDefinedNetwork on initial creation but the periodic refresh/Sync
routine (the function that calls GetNetworkNeighborhood and updates
workloadIDToProfileState) must always attempt to refresh the named network
object and retry transient failures rather than skipping updates entirely when
workloadHasUserDefinedNetwork(workloadID) is true; locate calls to
GetNetworkNeighborhood, the addContainer path, and the periodic sync code that
uses workloadHasUserDefinedNetwork and modify the sync path to fetch/update the
cached named NN (and retry on transient errors) while preserving the
initial-preference behavior in addContainer.
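One way to implement the reviewer's suggestion is to have the periodic sync re-fetch the named object instead of skipping the workload. This is a sketch with illustrative names: `fetch` stands in for `GetNetworkNeighborhood`, and the map-based cache is a simplification of the real cache structures.

```go
package main

import (
	"errors"
	"fmt"
)

// nn is a minimal stand-in for a NetworkNeighborhood object.
type nn struct{ Name string }

// fetch simulates GetNetworkNeighborhood against a backing store; the real
// cache API differs.
func fetch(store map[string]nn, name string) (nn, error) {
	if v, ok := store[name]; ok {
		return v, nil
	}
	return nn{}, errors.New("not found")
}

// syncWorkload sketches the suggested fix: for user-defined workloads the
// periodic sync re-fetches the named NN, so transient fetch failures are
// retried on the next tick and later edits to the NN are picked up, while
// auto-learned data never overwrites the user-defined entry.
func syncWorkload(cache, store map[string]nn, workloadID, userDefined string) {
	if userDefined != "" {
		if fresh, err := fetch(store, userDefined); err == nil {
			cache[workloadID] = fresh // refresh from the named object
		}
		return // on transient error: leave cache as-is, retry next sync
	}
	// ... standard auto-learn path would run here ...
}

func main() {
	store := map[string]nn{"fusioncore-network": {Name: "fusioncore-network"}}
	cache := map[string]nn{}
	syncWorkload(cache, store, "w1", "fusioncore-network")
	fmt.Println(cache["w1"].Name)
}
```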

entlein and others added 2 commits March 13, 2026 11:26
* Strip unused SBOM fields to reduce object size by ~52%

Signed-off-by: Ben <ben@armosec.io>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* use fixed StripSBOM from storage v0.0.247

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* bump github.com/goradd/maps v1.3.0

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Refactor DatasourceEvent methods to handle missing fields gracefully and improve logging

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Simplify StructEvent methods by removing event type checks and improving consistency

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Remove kskubemanager dependency and replace with kubemanager in tracer_factory

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Improve logging for DatasourceEvent field access errors, fix error_raw type

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Pass alert platform to RuleFailureCreator

Update github.com/armosec/armoapi-go to v0.0.678.

Pass AlertSourcePlatform to the constructor instead of hardcoding
AlertSourcePlatformK8s. Set the value to AlertSourcePlatformK8sAgent.

* Remove LocalManager.host parameter from gadget execution to fix host detection

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* fix http body parsing to respect Content-Length

Signed-off-by: Yakir Oren <yakiroren@gmail.com>

* Support for chunked
Signed-off-by: Ben <ben@armosec.io>

* Support for chunked also in response
Signed-off-by: Ben <ben@armosec.io>

* Don't log warning for missing proc enrichment in SyscallEventType DatasourceEvent methods

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Bump go.opentelemetry.io/otel/sdk from 1.39.0 to 1.40.0

Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.40.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Refactor DatasourceEvent methods to suppress invalid field length warnings

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Add ContainerID to virtual host container

Ensure ready channel is closed only once

Co-authored-by: Cerebras Agent <193945191+isaact-cerebras@users.noreply.github.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Skip containers with empty image name

Return early when ContainerImageName is empty and log debug details
including namespace, pod, container name, and container ID.

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* add buf_len field to HTTP gadget to fix ring buffer data bleed

Signed-off-by: Yakir Oren <yakiroren@gmail.com>

* add buf_len to TestHttpFields expected fields

Signed-off-by: Yakir Oren <yakiroren@gmail.com>

* add GetValidBuf edge case tests for buf_len handling

Signed-off-by: Yakir Oren <yakiroren@gmail.com>

* Set ContainerID for Host context before NodeName

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Skip shared data wait for host containers in RuleManager

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Refactor: Introduce IsHostContainer utility function and update checks in RuleManager and ContainerWatcher

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Skip processing for host containers in ContainerCallback methods

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Skip processing for host containers in network stream events

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Use utils.HostContainerID for host container identification

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Add AlertSourcePlatform to runtime alerts in http_exporter

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Add AlertSourcePlatform parameter to InitExporters and update related functions

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Skip processing profile metadata for host containers in creator.go

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Ensure ready channel is closed and container entry is removed on errors and timeouts in lifecycle.go

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Update armoapi-go dependency to v0.0.682 and refactor related imports

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* Fix: Normalize host container ID to nodeName in network stream handling

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

---------

Signed-off-by: Ben <ben@armosec.io>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Signed-off-by: Yakir Oren <yakiroren@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: entlein <einentlein@gmail.com>
Co-authored-by: Ben <ben@armosec.io>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Co-authored-by: Yakir Oren <yakiroren@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Cerebras Agent <193945191+isaact-cerebras@users.noreply.github.com>
Signed-off-by: entlein <einentlein@gmail.com>
Add support for user-defined NetworkNeighborhood profiles that allow
pre-configured network policies without requiring a learning phase.

Changes:
- Load user-defined NN in container lifecycle (lifecycle.go)
- NN cache support for user-defined profiles (networkneighborhoodcache.go)
- SharedContainerData: add UserDefinedNetwork field
- Enable R0005 (DNS Anomalies) and R0011 (Unexpected Egress) alert triggers
- Component tests: Test_27 (wildcard AP matching) and Test_28 (user-defined NN alerts)
- Test resources: known profiles, deployment YAMLs, helper scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: entlein <einentlein@gmail.com>
entlein added 2 commits March 13, 2026 11:30
Signed-off-by: entlein <einentlein@gmail.com>
Signed-off-by: entlein <einentlein@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (1)
pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go (1)

208-213: ⚠️ Potential issue | 🟠 Major

Refresh named user-defined NNs instead of skipping them here.

This continue makes user-defined NetworkNeighborhoods effectively write-once cache entries. If the initial GetNetworkNeighborhood on Lines 438-452 fails, there is no retry path; later edits to the named NN also stay stale until the workload restarts. This loop also overwrites workloadIDToProfileState from the auto-learned object before skipping, so state can diverge from the cached user-defined NN.

Also applies to: 436-452

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go` around
lines 208 - 213, The loop currently skips user-defined NetworkNeighborhoods via
the continue in the nnc.workloadHasUserDefinedNetwork(workloadID) branch,
causing write-once behavior and stale workloadIDToProfileState; remove the early
continue and instead, when nnc.workloadHasUserDefinedNetwork(workloadID) is
true, call nnc.GetNetworkNeighborhood(nameOrID) to refresh/read the
authoritative user-defined NN and use that result to update
workloadIDToProfileState (avoiding overwriting it with the auto-learned object),
and only skip applying auto-learned values if the fetched user-defined NN
exists; ensure the check happens before assigning the auto-learned NN into
workloadIDToProfileState so state stays consistent with the user-defined cache.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/component_test.go`:
- Around line 1681-1690: The hasAlert helper (hasAlert(alerts
[]testutils.Alert)) is too coarse for R0002 wildcard subtests because it only
matches rule_name, comm and container_name; update the test to either (A) filter
alerts by the actual accessed path exposed on the alert (e.g., check
alert.Labels["path"] or alert.Annotations["path"] together with
ruleName/comm/container_name inside hasAlert) so each subtest asserts the
specific opened file, or (B) seed every fixture with the same baseline opens
(the startup files such as /etc/ld.so.cache) before asserting wildcard behavior
so unrelated startup opens don’t affect the boolean result; modify the hasAlert
usage and the fixtures accordingly to ensure each subtest validates the exact
path matcher rather than collapsing to a single boolean.

In `@tests/resources/curl-user-network-deployment.yaml`:
- Around line 3-6: The deployment's metadata.name "curl-fusioncore-deployment"
collides with the one in the other manifest; update the metadata.name in this
manifest from "curl-fusioncore-deployment" to "curl-user-network-deployment" so
it matches the filename and avoids overwriting the other Deployment when applied
(verify you only change the metadata.name field and leave the app label
"curl-fusioncore-28-1" unchanged).

In `@tests/resources/known-application-profile.yaml`:
- Around line 15-17: The metadata block for the ApplicationProfile
"fusioncore-profile" is missing the annotations used by
Test_28_UserDefinedNetworkNeighborhood and known-network-neighborhood.yaml;
update the metadata for the resource named fusioncore-profile to include an
annotations section with keys kubescape.io/managed-by: User,
kubescape.io/status: completed, and kubescape.io/completion: complete (insert
the annotations block immediately after namespace in the metadata for the
ApplicationProfile manifest so the node-agent will recognize this user-defined
profile).

In `@tests/resources/known-network-neighborhood.yaml`:
- Around line 31-33: The YAML uses spec.matchLabels but the API expects
spec.labelSelector.matchLabels; change the spec block to nest matchLabels under
labelSelector so it matches v1beta1.NetworkNeighborhoodSpec (i.e., use
labelSelector: with MatchLabels like in metav1.LabelSelector) ensuring the field
name matches the LabelSelector structure used by the code that constructs
v1beta1.NetworkNeighborhoodSpec.

In `@tests/resources/nginx-both-user-defined-deployment.yaml`:
- Around line 3-6: Deployment name collision: change the metadata.name value
currently set to "nginx-fusioncore-deployment" in
nginx-both-user-defined-deployment.yaml to a unique, filename-matching name such
as "nginx-both-user-defined-deployment"; update any cross-resource references
(e.g., selectors, Service targetNames, tests or manifests that expect the old
name) to use the new name so labels/selectors remain consistent (check
metadata.name and any occurrences of "nginx-fusioncore-deployment").

In `@tests/resources/nginx-known-network-deployment.yaml`:
- Around line 3-6: The deployment in
tests/resources/nginx-known-network-deployment.yaml uses metadata.name:
nginx-fusioncore-deployment which collides with the same name in
tests/resources/nginx-user-network-deployment.yaml and causes one apply to
overwrite the other; fix by renaming the deployment in
nginx-known-network-deployment.yaml (change the metadata.name value, e.g., to
nginx-known-network-deployment) so the "known-network" and "user-network"
manifests have distinct names and won't clobber each other when applied in the
same namespace.

In `@tests/resources/nginx-user-network-deployment.yaml`:
- Around line 3-6: The deployment's metadata.name currently uses
"nginx-fusioncore-deployment" which collides with another manifest; rename
metadata.name to "nginx-user-network-deployment" in this file (update the value
for metadata.name) and verify any related selectors or references (e.g.,
labels/selectors) still match the new name so resources remain consistent.

---

Duplicate comments:
In `@pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go`:
- Around line 208-213: The loop currently skips user-defined
NetworkNeighborhoods via the continue in the
nnc.workloadHasUserDefinedNetwork(workloadID) branch, causing write-once
behavior and stale workloadIDToProfileState; remove the early continue and
instead, when nnc.workloadHasUserDefinedNetwork(workloadID) is true, call
nnc.GetNetworkNeighborhood(nameOrID) to refresh/read the authoritative
user-defined NN and use that result to update workloadIDToProfileState (avoiding
overwriting it with the auto-learned object), and only skip applying
auto-learned values if the fetched user-defined NN exists; ensure the check
happens before assigning the auto-learned NN into workloadIDToProfileState so
state stays consistent with the user-defined cache.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b335fa99-2461-41d7-9305-5c47926652b7

📥 Commits

Reviewing files that changed from the base of the PR and between 7f06cc9 and d1f0b75.

📒 Files selected for processing (16)
  • pkg/containerprofilemanager/v1/lifecycle.go
  • pkg/objectcache/networkneighborhoodcache/networkneighborhoodcache.go
  • pkg/objectcache/shared_container_data.go
  • tests/chart/templates/node-agent/default-rules.yaml
  • tests/component_test.go
  • tests/resources/curl-plain-deployment.yaml
  • tests/resources/curl-user-network-deployment.yaml
  • tests/resources/curl-user-profile-wildcards-deployment.yaml
  • tests/resources/debug-learn-nn.sh
  • tests/resources/known-application-profile-wildcards.yaml
  • tests/resources/known-application-profile.yaml
  • tests/resources/known-network-neighborhood.yaml
  • tests/resources/nginx-both-user-defined-deployment.yaml
  • tests/resources/nginx-known-network-deployment.yaml
  • tests/resources/nginx-user-defined-deployment.yaml
  • tests/resources/nginx-user-network-deployment.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/resources/known-application-profile-wildcards.yaml

Comment on lines +1681 to +1690
// hasAlert checks whether an R0002 alert exists for comm=cat, container=nginx.
hasAlert := func(alerts []testutils.Alert) bool {
	for _, a := range alerts {
		if a.Labels["rule_name"] == ruleName &&
			a.Labels["comm"] == "cat" &&
			a.Labels["container_name"] == "nginx" {
			return true
		}
	}
	return false

⚠️ Potential issue | 🟠 Major

The R0002 matcher is too coarse for these wildcard subtests.

hasAlert reduces every R0002 from cat in nginx to a single boolean, while the fixtures are not consistent about the extra files cat opens at startup — only the exact-match case whitelists /etc/ld.so.cache on Line 1740. That means the "expect NO alert" cases can fail on unrelated opens, and the "expect alert" cases can pass without proving the path matcher itself. Filter on the accessed path if the alert exposes it, or seed the same baseline opens in every fixture before asserting wildcard behavior.

Also applies to: 1738-1740

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/component_test.go` around lines 1681 - 1690, The hasAlert helper
(hasAlert(alerts []testutils.Alert)) is too coarse for R0002 wildcard subtests
because it only matches rule_name, comm and container_name; update the test to
either (A) filter alerts by the actual accessed path exposed on the alert (e.g.,
check alert.Labels["path"] or alert.Annotations["path"] together with
ruleName/comm/container_name inside hasAlert) so each subtest asserts the
specific opened file, or (B) seed every fixture with the same baseline opens
(the startup files such as /etc/ld.so.cache) before asserting wildcard behavior
so unrelated startup opens don’t affect the boolean result; modify the hasAlert
usage and the fixtures accordingly to ensure each subtest validates the exact
path matcher rather than collapsing to a single boolean.

Comment on lines +3 to +6
metadata:
  labels:
    app: curl-fusioncore-28-1
  name: curl-fusioncore-deployment

⚠️ Potential issue | 🟡 Minor

Deployment name collision with curl-plain-deployment.yaml.

Both this file and tests/resources/curl-plain-deployment.yaml define metadata.name: curl-fusioncore-deployment. While they use different app labels (curl-fusioncore-28-1 vs curl-fusioncore-28-0), applying both to the same namespace will cause the second to overwrite the first.

Consider renaming to curl-user-network-deployment to match the filename and avoid collision.

🔧 Proposed fix
 metadata:
   labels:
     app: curl-fusioncore-28-1
-  name: curl-fusioncore-deployment
+  name: curl-user-network-deployment
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/resources/curl-user-network-deployment.yaml` around lines 3 - 6, The
deployment's metadata.name "curl-fusioncore-deployment" collides with the one in
the other manifest; update the metadata.name in this manifest from
"curl-fusioncore-deployment" to "curl-user-network-deployment" so it matches the
filename and avoids overwriting the other Deployment when applied (verify you
only change the metadata.name field and leave the app label
"curl-fusioncore-28-1" unchanged).

Comment on lines +15 to +17
metadata:
  name: fusioncore-profile
  namespace: "{{NAMESPACE}}"

@coderabbitai coderabbitai bot Mar 13, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if this manifest is actually used in any tests or scripts
rg -l 'known-application-profile.yaml' --type=sh --type=go

Repository: kubescape/node-agent

Length of output: 46


🏁 Script executed:

# Check if the file exists and read its content
cat -n tests/resources/known-application-profile.yaml

Repository: kubescape/node-agent

Length of output: 9717


🏁 Script executed:

# Search for references to this file across all file types
rg 'known-application-profile' -i

Repository: kubescape/node-agent

Length of output: 501


🏁 Script executed:

# Find Test_28 to understand the annotation requirement
rg 'Test_28' -A 20 -B 5 | head -100

Repository: kubescape/node-agent

Length of output: 5725


Add missing annotations to match Test_28_UserDefinedNetworkNeighborhood and the parallel NetworkNeighborhood manifest.

The metadata lacks annotations present in both the inline ApplicationProfile created in Test_28_UserDefinedNetworkNeighborhood and the companion known-network-neighborhood.yaml file. These annotations are required for the node-agent to properly recognize user-defined profiles:

Add annotations block after namespace
metadata:
  name: fusioncore-profile
  namespace: "{{NAMESPACE}}"
  annotations:
    kubescape.io/managed-by: User
    kubescape.io/status: completed
    kubescape.io/completion: complete
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/resources/known-application-profile.yaml` around lines 15 - 17, The
metadata block for the ApplicationProfile "fusioncore-profile" is missing the
annotations used by Test_28_UserDefinedNetworkNeighborhood and
known-network-neighborhood.yaml; update the metadata for the resource named
fusioncore-profile to include an annotations section with keys
kubescape.io/managed-by: User, kubescape.io/status: completed, and
kubescape.io/completion: complete (insert the annotations block immediately
after namespace in the metadata for the ApplicationProfile manifest so the
node-agent will recognize this user-defined profile).

entlein (author):

told you already that APs don't need annotations, but thanks for finding that the inline test file has them, which it shouldn't.


@entlein, understood — APs don't require those annotations. Thanks for the correction!

Glad the comment was still useful in surfacing that the inline AP in Test_28_UserDefinedNetworkNeighborhood has spurious annotations. Would you like me to generate a fix to remove them from there? If so, let me know and I'll put together the diff.


✏️ Learnings added
Learnt from: entlein
Repo: kubescape/node-agent PR: 748
File: tests/resources/known-application-profile.yaml:15-17
Timestamp: 2026-03-13T12:25:33.349Z
Learning: In the kubescape/node-agent repository, ApplicationProfile (AP) resources do NOT require annotations such as `kubescape.io/managed-by: User`, `kubescape.io/status: completed`, or `kubescape.io/completion: complete`. These annotations are used for NetworkNeighborhood (NN) resources, not APs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +31 to +33
spec:
  matchLabels:
    app: curl-fusioncore-28-1

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and read the YAML manifest file
find . -name "known-network-neighborhood.yaml" -type f

Repository: kubescape/node-agent

Length of output: 113


🏁 Script executed:

# Search for NetworkNeighborhoodSpec definition
rg "type NetworkNeighborhoodSpec" -A 20

Repository: kubescape/node-agent

Length of output: 46


🏁 Script executed:

# Check the test file at the mentioned lines
if [ -f "tests/component_test.go" ]; then
  sed -n '2004,2007p' tests/component_test.go
fi

Repository: kubescape/node-agent

Length of output: 204


🏁 Script executed:

# Search for any CRD definitions related to NetworkNeighborhood
find . -name "*.yaml" -o -name "*.yml" | xargs grep -l "NetworkNeighborhood" 2>/dev/null | head -10

Repository: kubescape/node-agent

Length of output: 154


🏁 Script executed:

cat -n ./tests/resources/known-network-neighborhood.yaml

Repository: kubescape/node-agent

Length of output: 1837


🏁 Script executed:

# Search for where NetworkNeighborhoodSpec comes from
rg "import.*v1beta1|from.*v1beta1" tests/component_test.go -B 5 -A 2 | head -30

Repository: kubescape/node-agent

Length of output: 46


🏁 Script executed:

# Check imports in component_test.go to understand the package
head -50 tests/component_test.go | grep -E "^import|v1beta1"

Repository: kubescape/node-agent

Length of output: 259


🏁 Script executed:

# Search for NetworkNeighborhood struct in the repo
rg "type NetworkNeighborhood " -A 10

Repository: kubescape/node-agent

Length of output: 46


🏁 Script executed:

# Search for CRD definitions or examples in workflow files
rg "labelSelector" tests/ --type yaml -B 2 -A 2

Repository: kubescape/node-agent

Length of output: 46


🏁 Script executed:

# Check go.mod to understand the external dependency version
cat go.mod | grep storage

Repository: kubescape/node-agent

Length of output: 151


🏁 Script executed:

# Look for any other YAML examples that might show the correct structure
rg "spec:" tests/resources/*.yaml -A 5 | head -50

Repository: kubescape/node-agent

Length of output: 2249


Use spec.labelSelector.matchLabels here, not spec.matchLabels.

The test code at lines 2004-2007 constructs v1beta1.NetworkNeighborhoodSpec with LabelSelector: metav1.LabelSelector{MatchLabels: ...}, showing the field nesting required by the API. The current YAML structure at lines 31-33 is missing the intermediate labelSelector level and will not match the schema, causing validation to fail or the field to be ignored.

Suggested fix
 spec:
-  matchLabels:
-    app: curl-fusioncore-28-1
+  labelSelector:
+    matchLabels:
+      app: curl-fusioncore-28-1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/resources/known-network-neighborhood.yaml` around lines 31 - 33, The
YAML uses spec.matchLabels but the API expects spec.labelSelector.matchLabels;
change the spec block to nest matchLabels under labelSelector so it matches
v1beta1.NetworkNeighborhoodSpec (i.e., use labelSelector: with MatchLabels like
in metav1.LabelSelector) ensuring the field name matches the LabelSelector
structure used by the code that constructs v1beta1.NetworkNeighborhoodSpec.

Comment on lines +3 to +6
metadata:
  labels:
    app: nginx-fusioncore
  name: nginx-fusioncore-deployment

⚠️ Potential issue | 🟡 Minor

Deployment name collision with nginx-user-network-deployment.yaml.

Both this file and tests/resources/nginx-user-network-deployment.yaml define metadata.name: nginx-fusioncore-deployment. If both are applied to the same namespace, the second apply will silently overwrite the first, invalidating tests that need to exercise both "known-network" (without user-defined label) and "user-network" (with kubescape.io/user-defined-network label) code paths.

Consider renaming this deployment to nginx-known-network-deployment or similar to avoid collision.

🔧 Proposed fix
 metadata:
   labels:
     app: nginx-fusioncore
-  name: nginx-fusioncore-deployment
+  name: nginx-known-network-deployment
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/resources/nginx-known-network-deployment.yaml` around lines 3 - 6, The
deployment in tests/resources/nginx-known-network-deployment.yaml uses
metadata.name: nginx-fusioncore-deployment which collides with the same name in
tests/resources/nginx-user-network-deployment.yaml and causes one apply to
overwrite the other; fix by renaming the deployment in
nginx-known-network-deployment.yaml (change the metadata.name value, e.g., to
nginx-known-network-deployment) so the "known-network" and "user-network"
manifests have distinct names and won't clobber each other when applied in the
same namespace.

Comment on lines +3 to +6
metadata:
  labels:
    app: nginx-fusioncore
  name: nginx-fusioncore-deployment

⚠️ Potential issue | 🟡 Minor

Deployment name collision — rename to match filename.

As noted in the review of nginx-known-network-deployment.yaml, this file shares the same metadata.name: nginx-fusioncore-deployment. Consider renaming to nginx-user-network-deployment to avoid collision and match the filename.

🔧 Proposed fix
 metadata:
   labels:
     app: nginx-fusioncore
-  name: nginx-fusioncore-deployment
+  name: nginx-user-network-deployment
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/resources/nginx-user-network-deployment.yaml` around lines 3 - 6, The
deployment's metadata.name currently uses "nginx-fusioncore-deployment" which
collides with another manifest; rename metadata.name to
"nginx-user-network-deployment" in this file (update the value for
metadata.name) and verify any related selectors or references (e.g.,
labels/selectors) still match the new name so resources remain consistent.

Labels: none yet · Project status: WIP · 2 participants